Abstract

The level of semantic data interoperability between a source and a receiver is a
function of the context interchange mechanism that operates between the source and
the receiver. The semantic interoperability mechanisms in existing systems are usually
static in nature and cannot cope with changes in the semantics of data either at the
source or at the receiver. In this paper, we propose a context interchange mechanism,
based on lattice theory, which can handle changes in the semantics of data at both the
source and the receiver. A site-copy selection algorithm is also presented in this paper
which selects the set of sources that can supply semantically meaningful data to the
query of a particular source.

Keywords:  semantic-interoperability,  context-interchange,  lattice  theory, 
and  query-processing. 

1      Introduction 

It would not be an exaggeration to claim that data required for any application are available
at some data-source in the world. Data in such data-sources might have a different context
from the application context. If it is possible to convert the context of data in the data-source
to the context of an application, then it is less expensive to reuse these data than to collect
the required data for an application from scratch. Research on data communications has
concentrated on the physical obstacles to the reliable transfer of data between a source and
a receiver, but rarely on issues such as the mismatch between the context of the data-source and
the context of the data-receiver.

What exactly constitutes the context is difficult to answer [Lyo81]. The concept of
context has been addressed in many areas such as sensory process, perception, language,
concept learning, recall and recognition [Bur52, Coe77, Tho88]. The main reason for the
context assuming a central role in these areas is that objects and their associated events
constitute an integral part of their environment and cannot be understood in isolation from that
environment. In this paper we do not attempt to give a precise definition for this term, even
though this is part of our long term research objective. We assume that context knowledge
of a data item is a triple given by the semantic knowledge of the data, the organization of
the data, and the quality parameters of the data. In this paper, we concentrate only on the
semantic component of the context, which is formally defined in Section 3.

Consider the process by which a financial analyst accesses the prices for shares of a
particular company. He or she needs to gather information from several stock exchanges
located in different nations and must overcome semantic discrepancies at multiple levels: the
stock prices are stated in different currencies; the currencies are floating with respect to each
other; the stock price may be the latest-price or the closing-price; etc. Such semantics are
implicit in many existing databases. Unless these semantics are made explicit, it is difficult
to identify and resolve underlying semantic incompatibilities. The fundamental question is
how to make such semantics explicit and how to quickly identify the incompatibilities and
resolve them if possible.

A number of researchers [LR82, Tem89, DAT87, Ke88, RJPS89, BT84] have proposed
solutions that aim to identify semantic incompatibilities during the process of schema
integration. In this scenario, every application defines its own views on the integrated schema
and these views are used to translate the semantics of the integrated schema into application
semantics. In this approach, the semantics of data in a database are first translated into the
semantics of the integrated schema and then translated into the semantics of the application.
This strategy is expensive if any change occurs in the semantics of the data that is
being integrated. In [SM91], a rule-based approach was proposed in which these semantic
differences are dynamically identified and resolved using the context information associated
with the data items required by an application. In this approach, the semantics of data in a
database can be directly translated into the semantics of the application. In this paper, we
extend this model with a lattice-based representation for the context knowledge. We believe
that the lattice representation is more natural for representing the context knowledge and
for cross comparison.

Our  context  interchange  scenario  is  shown  in  Figure  1.  This  scenario  consists  of  a  set 
of  applications,  a  set  of  data  sources,  and  a  context  mediator.  Each  application  or  data- 
source  is  associated  with  a  Context  Data  Repository  (CDR),  which  explicitly  specifies  the 
context of each concept relevant to it from its point of view. Whenever an application
requests data, the context mediator ensures that the application receives semantically
meaningful data from various data sources (if it is possible). As such, the context mediator
must  possess  the  capability  to  reason  with  different  contexts  (i.e.,  with  different  CDRs) 
simultaneously.  The  representation  scheme  of  context  in  each  CDR  can  enable  the  context 
mediator  to  simultaneously  reason  with  different  CDRs.  This  imposes  a  need  for  systematic 
description  of  context  knowledge  of  application  requirements  and  the  context  knowledge 
of  data-sources;  we  have  adopted  the  lattice  model  for  such  representation  based  on  the 
following  considerations: 

• A database can supply meaningful data to an application provided the database context
is more general than the application context. The more-general-than relation can be
elegantly modeled using the lattice model. In a lattice, the context becomes more
specific as one moves upward and more generalized as one moves downward.

• Context knowledge can be economically represented in the form of a lattice, as the
knowledge present at a particular context can be shared by all contexts which
are more specific than that context.

• Reasoning with context knowledge in the form of a lattice is economical because an
application receives meaningful solutions from all data sources which are in a more
generalized context than the application, and the more-general-than relation is easy to
trace in a lattice.

Within the lattice-based context interchange framework, conditions under which a data
source can supply semantically meaningful data to address an application's data requirements
are derived in this paper. We also propose a context-driven site-copy selection algorithm,
which selects the candidate data-sources that can supply meaningful data to queries
initiated by different applications.

This paper is organized as follows: Section 2 presents a brief overview of representation
schemes for the context knowledge. The approach selected in this paper for representing
context knowledge is presented in Section 3. In Section 4, a mechanism to describe the context
data repository for databases and their applications is discussed. A procedure for context
mediation is presented in Section 5. Query processing and associated issues are discussed in
Section 6. The last section contains our conclusions.

2      Lattice-based  Representation  of  Context 

The concept of context has been treated from different perspectives by various researchers.
For example, the Truth Maintenance System (TMS) [Doy89] and the Assumption-based
Truth Maintenance System (ATMS) [dK86a] have adopted different representation schemes
for representing the notion of context. In TMS, each datum is labeled either IN or OUT,
as determined by the given boundary condition. The notion of context is implicit in the
boundary conditions: all the data items which are believed to be true in that context
are labelled IN and all the data items which are not believed to be true in that context
are labelled OUT. The context changes only when one of the boundary conditions changes.
As such, TMS maintains only one context (global context) at any given point of time.
In contrast, ATMS provides all the contexts in which the data items are believed and can
represent multiple contexts simultaneously. The computational advantages of ATMS are
discussed in [McD83, dK86a, dK86b, dK86c]. The context interchange problem is somewhat
closer to that of ATMS as the context-mediator needs to interact and reason with multiple
contexts simultaneously.

Figure 1: Context Interchange Scenario

2.1      Notation  and  Definitions 

Let X and Y represent the context knowledge about a concept from two different viewpoints.
Five possible relations can exist between X and Y, as follows:

(i) $X = Y$; X and Y are in the same context.

(ii) $X \subset Y$; X is a more generalized context than Y.

(iii) $X \supset Y$; X is a more specific context than Y.

(iv) $X \cap Y \neq \emptyset$ and none of (i), (ii), or (iii) is satisfied; X and Y possess some context
in common.

(v) $X \cap Y = \emptyset$; X and Y are totally disjoint contexts.

Two contexts are comparable if either (i), (ii) or (iii) is satisfied. Therefore one can define
only a partial order among the set of contexts using the relation $\subseteq$ and form a context-lattice.
A lattice is a partially ordered set, with $X \subseteq Y$ meaning $X \cap Y = X$ and $X \cup Y = Y$, in which
each pair of elements possesses a greatest lower bound and a least upper bound within the
set. If $X \subseteq Y$ then one says that X is as general a context as Y [Sho91].

A context alone is not interesting; it is interesting only when it is linked with all
assertions which are true in that context. Contexts can be viewed as envelopes enclosing some
assertions. If an assertion A is true in the context X then this information is represented as
$A^X$. This assertion may be true in one context and untrue in another context. If $A^X$ is valid
and if $Y \subset X$ then $A^Y$ is also valid. In other words, if an assertion A is true in a particular
context, say X, then it is true in all contexts which are more general than X. This is the
basis for the context interchange mechanism presented in this paper.
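
The five relations and the inheritance rule above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a context is modeled as a frozenset of labels, so that the more-general-than relation becomes the ordinary proper-subset test. The function names `relation` and `holds_in` are hypothetical and do not come from the paper.

```python
def relation(x: frozenset, y: frozenset) -> str:
    """Classify two contexts into one of the five relations (i)-(v)."""
    if x == y:
        return "same"            # (i)   X = Y
    if x < y:
        return "more general"    # (ii)  X is a proper subset of Y
    if x > y:
        return "more specific"   # (iii) X is a proper superset of Y
    if x & y:
        return "overlapping"     # (iv)  some context in common
    return "disjoint"            # (v)   nothing in common


def holds_in(assertion_context: frozenset, target: frozenset) -> bool:
    """An assertion valid in X is taken to be valid in every Y contained in X."""
    return target <= assertion_context


print(relation(frozenset({"A"}), frozenset({"A", "C"})))
print(holds_in(frozenset({"A", "C"}), frozenset({"A"})))
```

Under this encoding, only the first three outcomes make two contexts comparable, which is why the contexts form a partial rather than a total order.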

3      Semantic  Assertions  and  Context-Knowledge 

Each data-source is visualized as a set of distinct object types and each object is an
aggregation of a set of properties. We categorize properties into two classes: primitive properties and
non-primitive properties. Semantics of primitive properties are the same across all
applications and data sources, whereas the semantics of a non-primitive property may be different
for different applications and at different data sources. The semantics of a non-primitive
property are captured through a context-assertion lattice.

The context-assertion domain of a property is the set of contexts relevant to the property
and a set of assertions which may be true in each context. The context-assertion domain of a
property is given by the context-assertion lattice of the property. This lattice is constructed
from the context lattice and the semantic-assertion domain. The context lattice, the
semantic-assertion domain, and the context-assertion lattice are described in the following
subsections.

3.1      Context-Lattice 

As mentioned earlier, the semantics of any property cannot be understood in isolation from
the environment/context in which it is intended to be used. In other words, the
environment/context associated with a property functionally determines the semantics of the
property. As observed in [SSR92], the environment of a property can occasionally be
determined by other properties of the object. These properties are collectively referred to as
the environment schema of the non-primitive property. The environment schema needs to
be tagged whenever a non-primitive property is moved from one environment to another.
The set of all possible environments/contexts in which the property is defined is called the
context domain of the property. The context domain of a property is represented by the
context lattice.

Let Environment(P) denote the environment schema of a non-primitive property P. Let
$E_j$ be a property in Environment(P). Let $Dom(E_j)$ be the domain, that is, the set of legal
values associated with $E_j$. The assignment of each value $e_j \in Dom(E_j)$ to the property $E_j$ sets a
different context for the non-primitive property P. For example, let Dom(Instrument-type)
be { Equity, Future } and Dom(Exchange) be { Nyse, Tokyo }. Assignment of Equity to
Instrument-type sets a particular context for the property Trade-price, whereas the assignment
of Future to Instrument-type sets another context for the property Trade-price. Different
contexts associated with the property Trade-price in the example are shown in Table-1.


Table-1

Notation    Context
A           Instrument-type='Equity'
B           Instrument-type='Future'
C           Exchange='Nyse'
D           Exchange='Tokyo'

We  define  the  base  context  set,  denoted  by  R,  for  a  non-primitive  property  P,  as  follows: 

$$R = \bigcup_{E_j \in \mathrm{Environment}(P)} \mathrm{Dom}(E_j)$$

The  powerset  of  R  generates  a  context-lattice  for  the  property  P. 

In  the  example,  R  is  given  by  the  set  {  A,  B,  C,  D  }.  The  power  set  of  R  generates  the 
context-lattice  for  Trade-price.  The  context-lattice  generated  for  the  Trade-price  is  shown  in 
Figure  2.  Crossed  out  nodes  in  the  context-lattice  are  called  no-good  nodes.  In  other  words 
such  contexts  are  not  applicable  to  the  particular  property. 
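
The construction of the context-lattice from the base context set can be sketched in Python as follows. The sketch rests on one loudly stated assumption: a context is treated as no-good when it assigns two different values to the same environment property (for example, both A and B). The text only says that no-good nodes are crossed out in Figure 2, so this criterion is an illustrative guess, and the helper names `powerset` and `is_no_good` are hypothetical.

```python
from itertools import combinations

# Domains from the running example (Table-1).
DOMAINS = {
    "Instrument-type": ["Equity", "Future"],
    "Exchange": ["Nyse", "Tokyo"],
}

# Base context set R: one element per (property, value) assignment.
R = [(prop, value) for prop, values in DOMAINS.items() for value in values]


def powerset(items):
    """All subsets of items, as frozensets, ordered by size."""
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]


def is_no_good(context):
    """ASSUMED rule: no-good if one property receives two different values."""
    props = [prop for prop, _ in context]
    return len(props) != len(set(props))


lattice = powerset(R)                              # every node of the lattice
usable = [c for c in lattice if not is_no_good(c)]  # nodes that are not crossed out
print(len(lattice), len(usable))
```

With four base contexts the power set has sixteen nodes; how many of them remain usable depends entirely on the no-good rule chosen above.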

As  mentioned  earlier,  contexts  are  interesting  only  when  they  are  linked  with  all  assertions 
which  are  true  in  that  context.  The  issue  of  semantic  assertions  of  a  non-primitive  property 
is  discussed  in  the  next  subsection. 

3.2      Semantic  Assertions 

The semantic assertions of each property can be defined using a finite set of parameters
called meta-properties. Consider a non-primitive property P. Let Meta-Properties(P) be the
set of meta-properties associated with the property P. For example, the property Trade-price
may have two meta-properties 'Currency' and 'Status'.

Meta-Properties(Trade-price) = { Currency, Status }

Each meta-property is associated with a finite set of meta-values. Let $MP_i$ be a
meta-property of property P. Let Meta-Values($MP_i$) be the set of meta-values associated with the
meta-property $MP_i$. For example, the meta-property 'Currency' may have two meta-values
'Dollars' and 'Yens'.

Meta-Values(Currency) = { Dollars, Yens }

Figure 2: Context-lattice of Trade-price (Refer to Table-1 for definitions of A through D)


In some context, say 'X', the meta-property $MP_i$ may have a meta-value $MV_1$; in another
context, it may have a meta-value $MV_2$, and so on. For example, the meta-value of Currency
may be Dollars in one context and Yens in another context. This implies that two semantic
assertions are possible for the meta-property Currency. Similarly, if one assumes that
Meta-Values(Status) is given by { 'Latest-price', 'Closing-price' }, then semantic assertions for
'Trade-price' will be as shown in Table-2.


Table-2

Notation    Semantic Assertions
E           Currency='Dollars'
F           Currency='Yens'
G           Status='Latest-price'
H           Status='Closing-price'

Let the set of semantic assertions associated with the meta-property $MP_i$ be denoted
by Assertions($MP_i$). The semantic assertion domain of property P is defined by the cross
product of semantic assertions associated with each meta-property of P. Let N be the number
of meta-properties associated with the property P. The semantic assertion domain of P can
be formally written as:

$$\text{Semantic-Assertion-Domain}(P) = \text{Assertions}(MP_1) \times \text{Assertions}(MP_2) \times \cdots \times \text{Assertions}(MP_N)$$

In the example, the semantic-assertion domain of the property 'Trade-price' is given by:

Semantic-Assertion-Domain(Trade-price)
= Assertions(Currency) $\times$ Assertions(Status)
= { E, F } $\times$ { G, H }
= { EG, EH, FG, FH }

In the above expression, EG means $E \wedge G$, where $\wedge$ is the boolean 'and' operator; similar
notation holds for other terms. If two semantic assertions, say E and F, are convertible, then
EH and FH are also convertible, and likewise EG and FG. For example, if a Trade-price in Dollars is convertible to a
Trade-price in Yens then a Trade-price which is Latest-price in Dollars (EG) is also convertible to
a Trade-price which is Latest-price in Yens (FG).
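
The cross product that defines the semantic assertion domain can be checked with a short Python sketch. The dictionary layout and the function name `semantic_assertion_domain` are illustrative assumptions; the letters E through H are the notations of Table-2, grouped by the meta-property they belong to.

```python
from itertools import product

# Semantic assertions from Table-2, grouped by meta-property.
ASSERTIONS = {
    "Currency": ["E", "F"],   # E: Dollars, F: Yens
    "Status": ["G", "H"],     # G: Latest-price, H: Closing-price
}


def semantic_assertion_domain(assertions):
    """Cross product of the assertion sets of all meta-properties."""
    return ["".join(combo) for combo in product(*assertions.values())]


domain = semantic_assertion_domain(ASSERTIONS)
print(domain)  # ['EG', 'EH', 'FG', 'FH']
```

Each element of the result, such as EG, stands for the conjunction of one assertion per meta-property.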

3.3      Context-Assertion  Lattice  of  a  Non-primitive  Property

The  context-assertion  lattice  of  a  property  is  generated  by  the  cross-product  of  the  context- 
lattice  and  the  semantic  assertion  domain.  Therefore,  the  context-assertion  lattice  couples 
the  contexts  of  a  property  with  its  associated  assertions.  As  an  example,  the  context-assertion 
lattice  of  Trade-price  is  as  shown  in  Figure  3. 

A node $n_i$ in the context-assertion lattice is a pair denoted by $(x_i, y_i)$. The first co-ordinate
projection (denoted by $PJ_1$) is called the context co-ordinate and the second co-ordinate
projection (denoted by $PJ_2$) is called the semantic co-ordinate. These two co-ordinates are
defined as follows:

$PJ_1(n_i) \rightarrow x_i$

$PJ_2(n_i) \rightarrow y_i$

The relations $\subseteq$ and $\supseteq$ between two nodes $n_a$ and $n_d$ in the context-assertion lattice are defined as
follows:

$n_a \subseteq n_d$ provided $PJ_1(n_a) \subseteq PJ_1(n_d)$ and $PJ_2(n_a) = PJ_2(n_d)$.

$n_a \supseteq n_d$ provided $PJ_1(n_a) \supseteq PJ_1(n_d)$ and $PJ_2(n_a) = PJ_2(n_d)$.

Given any two nodes $n_a$ and $n_d$ in the context-assertion lattice, one can define three relations
as follows:

Total context-lift: The context of node $n_d$ can be totally lifted to the context of node $n_a$,
provided $n_d \subseteq n_a$.

Partial context-lift: The context of node $n_d$ can be partially lifted to the context of a
node $n_a$, whenever $n_d \supset n_a$.

No context-lift: The context of node $n_d$ cannot be lifted to another node $n_a$ if the context
of $n_d$ cannot be either totally or partially lifted to the context of $n_a$.

Node Equivalence: Given a node $n_d$ in the context-assertion lattice, we define a
node-equivalence class denoted by $[n_d]$ as follows:

$[n_d] = \{\, n_a \mid n_a$ is in the context-assertion lattice and $PJ_1(n_d) = PJ_1(n_a)$ and $PJ_2(n_d)$
is convertible to $PJ_2(n_a) \,\}$

If the context of any node $n_k \in [n_d]$ can be lifted to the context of $n_a$ then every node
$n_l \in [n_d]$ can be lifted to the context of $n_a$.
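
The three lift relations can be sketched in Python under stated assumptions. A node is modeled as a hypothetical `Node` pair of a context (a frozenset of labels, as in Table-1) and a semantic assertion (a string such as 'EG'). Two nodes with the same context and the same assertion are treated as a total lift, and convertibility between assertions (node equivalence) is deliberately left out, so the sketch compares semantic co-ordinates by plain equality.

```python
from typing import FrozenSet, NamedTuple


class Node(NamedTuple):
    context: FrozenSet[str]   # context co-ordinate, PJ1
    assertion: str            # semantic co-ordinate, PJ2


def lift(source: Node, target: Node) -> str:
    """Classify how the context of `source` lifts to the context of `target`."""
    if source.assertion != target.assertion:
        return "none"         # semantic co-ordinates must agree
    if source.context <= target.context:
        return "total"        # source is at least as general as target
    if source.context > target.context:
        return "partial"      # source is strictly more specific than target
    return "none"             # contexts are not comparable


db_node = Node(frozenset({"A"}), "EG")        # data-source node
ap_node = Node(frozenset({"A", "C"}), "EG")   # application node
print(lift(db_node, ap_node), lift(ap_node, db_node))
```

Extending the sketch to node equivalence would replace the equality test on assertions with a check against a table of convertible assertions.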

The  context  and  its  associated  semantic  assertions  of  any  instance  of  P  can  be  given  by  one 
of  the  nodes  of  the  context-assertion  lattice.  Our  model  requires  a  context-assertion  lattice 
for  each  non-primitive  property  that  is  being  shared  (interoperable)  among  applications  and 
data-sources.  The  use  of  context-assertion  lattice  in  describing  the  context  knowledge  of  a 
data-source  or  an  application  is  discussed  in  the  next  section. 

4      Context  Definition  Repository 

In the proposed context interchange architecture, each data-source or application needs
to have a Context Definition Repository (CDR). Semantic assertions and their associated
contexts for each and every non-primitive property of a data source collectively constitute
the CDR for the data-source. A similar definition holds for an application's CDR.

Consider a database, db1, which supplies the instance values of the property
Trade-price. Assume that the instances of Trade-price are defined in three different contexts,
namely C1, C2, and C3. The context C1 is defined by Instrument-type = 'Equity' and
Exchange = 'Nyse', the context C2 is defined by Exchange = 'Tokyo', and the context C3
is defined by Instrument-type = 'Future'. Let the semantic assertions Status = 'Latest-price'
and Currency = 'Dollars' be true in context C1, Status = 'Closing-price' and Currency
= 'Yens' be true in C2, and finally Status = 'Closing-price' and Currency = 'Dollars' be
true in context C3. These contexts C1, C2, C3 and their respective semantic assertions can
be mapped to nodes n21, n20, and n9 respectively in the context-assertion lattice of property
Trade-price. The collection of all contexts and semantic assertions which are true in these
contexts for a non-primitive property in a particular data-source forms the characterizing
environment for the non-primitive property at that particular data-source. Now one can
define the characterizing environment of Trade-price in the data-source db1 as:

$\pi^{db1}_{\text{Trade-price}} = \{n21, n20, n9\}$

As such, the context data repository of a data-source can be described by defining the
characterizing environments for each and every non-primitive property that can be supplied.
The collection of such characterizing environments, one for each non-primitive property
relevant to a data-source, forms the Context Definition Repository (CDR) for that
data-source. Similarly, the characterizing environments for each and every non-primitive property
required for the application constitute the Context Definition Repository (CDR) for that
application.

Figure 3: Context-assertion lattice of Trade-price

5      Context  Comparison 

One aim of context comparison is to ensure that applications receive only meaningful data
from data-sources. To accomplish this, the query processor must analyze the query, identify
all the relevant nodes in the CDR of the application, compare these nodes with the nodes
in the CDRs of data sources, and identify potential sources which can supply meaningful
data to the application. Checking for these relations at run-time, that is, during query
evaluation, places an additional burden on the query processor. Therefore, the context
comparison task is ideally delegated to the context-mediator. The context-mediator can make certain
comparisons in advance and the query-processor can utilize the results of these comparisons
at query evaluation time.

In this scenario, the goal of the context-mediator can be conceptually outlined. The
context-mediator is provided with a set of CDRs of databases and applications. The task of the
context-mediator is to efficiently determine the subset of databases whose context can be
lifted to an application's context. Efficiency can be achieved by taking advantage of two
important factors. First, the context-mediator can be supplied with changes in the CDR
of the relevant application or database, allowing the context-mediation process to be
incremental, with updates for the changed contexts only. Second, the query-processor usually
needs to know only which data-source can supply meaningful data to an application's query,
and only rarely the contents of the CDR of a data-source. The context mediator can therefore
be constructed as an intermediate data structure which can make very rapid comparisons
of contexts. This data structure must also help the context-mediator in identifying and
correcting the pre-compiled comparisons whenever changes are made to the existing CDRs.

The context-mediator associates every node in the context-assertion lattice with two lists:
a Source-list and a Consumer-list. A Source-list states all the databases which contain
the same node in their respective CDRs. A Consumer-list states all the applications which
have the same node in their respective CDRs. These lists are the same for all equivalent
nodes in the context-assertion lattice.

To demonstrate context comparison, consider three applications ap1, ap2, and ap3, whose
data requirements span data-sources db1, db2, and db3. Assume that applications ap1,
ap2, and ap3 require the instances of the non-primitive property 'Trade-price' in different
contexts and further that 'Trade-price' is available in db1, db2, and db3 in different contexts.
The semantics of 'Trade-price' in different applications and in different databases are given
by characterizing environments in their respective CDRs.

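$\pi^{db1}_{\text{Trade-price}} = \{n21, n20, n9\}$

$\pi^{db2}_{\text{Trade-price}} = \{n20, n22\}$

$\pi^{ap1}_{\text{Trade-price}} = \{n21, n32\}$

$\pi^{ap2}_{\text{Trade-price}} = \{n21, n33\}$

$\pi^{ap3}_{\text{Trade-price}} = \{n13\}$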

Consider the context-assertion lattice for 'Trade-price' shown in Figure 3. Using the
characterizing environments of 'Trade-price' given above, the source-list and consumer-list for
each node in the context-assertion lattice can be constructed. As an example, consider node
n21 in the context-assertion lattice of 'Trade-price'. The element db1 is entered into the
Source-list and elements ap1 and ap2 are entered into the Consumer-list of the node n21.
The same source-list and consumer-list will be attached to all nodes which are equivalent
to n21, i.e., nodes like n23. In Figure 3, for each node, an arrow leaving the node
with a broken line shows the source-list and the consumer-list associated with the node. For
simplicity, only non-empty source and consumer lists are shown in this figure.
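
The construction of Source-lists and Consumer-lists amounts to inverting the characterizing environments, which the following Python sketch illustrates. It reuses the characterizing environments listed above; db3 is omitted because its environment is not listed, and the propagation of lists to equivalent nodes (such as n23) is not modeled. The names `SOURCES`, `CONSUMERS` and `build_lists` are hypothetical.

```python
from collections import defaultdict

# Characterizing environments of Trade-price, as read from the surrounding text.
SOURCES = {"db1": {"n21", "n20", "n9"}, "db2": {"n20", "n22"}}
CONSUMERS = {"ap1": {"n21", "n32"}, "ap2": {"n21", "n33"}, "ap3": {"n13"}}


def build_lists(environments):
    """Invert {site: nodes} into {node: set of sites}."""
    lists = defaultdict(set)
    for site, nodes in environments.items():
        for node in nodes:
            lists[node].add(site)
    return dict(lists)


source_list = build_lists(SOURCES)       # node -> databases holding that node
consumer_list = build_lists(CONSUMERS)   # node -> applications requiring it
print(source_list["n21"], sorted(consumer_list["n21"]))
```

For node n21 this reproduces the example in the text: db1 appears in the Source-list, and ap1 and ap2 appear in the Consumer-list.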

With the help of the Source-list and Consumer-list, the context mediator can identify
potential sources which can provide a semantically meaningful solution to an application's query.
The context mediator will become a performance bottleneck if every application needs to
consult the context mediator for each of its queries. To avoid this performance bottleneck,
the context-mediator is envisaged to distribute the required knowledge of context mediation
among the CDRs of applications. Every node in the CDR of an application is augmented with
two disjoint lists, namely the Total-Suppliers-List and the Partial-Suppliers-List. The derivation of
the Total-Suppliers-List and Partial-Suppliers-List is discussed in the following paragraphs.

Total-Suppliers-List: The Total-Suppliers-List contains all the data-source nodes whose
context can be totally lifted to the context of an application node. The union of Source-lists
associated with each node present in each chain reaching a particular application node
in the context-assertion lattice forms the Total-Suppliers-List for the application node. These
chains also include the application node.

Partial-Suppliers-List: The Partial-Suppliers-List gives all the data-source nodes which
can only be partially lifted to the application node. The union of Source-lists of each node
present in each chain leaving the application node in the context-assertion lattice forms the
Partial-Suppliers-List for the application node. These chains exclude the application node.

Consider node n21 in the CDR of application ap1. There are two chains reaching this
node in the context-assertion lattice, namely {n21, n13, n1, n0} and {n21, n5, n1, n0}. The
union of source-lists of all nodes in these chains is { db1 }. Therefore the Total-Suppliers-List
of n21 of application ap1 will be { db1 }. Since there are no chains leaving n21, the
Partial-Suppliers-List of n21 in the CDR of ap1 will be empty. Similarly, for node n32, there are
chains reaching this node, such as { n32, n20, n4, n0 }. Therefore the Total-Suppliers-List of
n32 will be { db1, db2 }, and since there are no chains leaving n32, the Partial-Suppliers-List
will be empty.
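
The derivation of the two supplier lists can be sketched in Python as a reachability computation. The sketch uses only the chains quoted in the text as its covering relation, so `BELOW` is a partial and assumed fragment of the lattice in Figure 3; `SOURCE_LIST` contains only the non-empty Source-lists of nodes on those chains, and db3 is absent because its environment is not given. All names are hypothetical. Disjointness of the two lists is enforced here by removing total suppliers from the partial list, which is an assumption about how the paper keeps them disjoint.

```python
# Covering relation taken from the chains quoted in the text:
# each node maps to the nodes immediately below it (more general contexts).
BELOW = {
    "n21": {"n13", "n5"}, "n13": {"n1"}, "n5": {"n1"}, "n1": {"n0"},
    "n32": {"n20"}, "n20": {"n4"}, "n4": {"n0"}, "n0": set(),
}
SOURCE_LIST = {"n21": {"db1"}, "n20": {"db1", "db2"}}

# Inverse relation: each node maps to the nodes immediately above it.
ABOVE = {}
for upper, lowers in BELOW.items():
    for lower in lowers:
        ABOVE.setdefault(lower, set()).add(upper)


def reachable(start, edges):
    """All nodes reachable from `start` by following `edges` (start excluded)."""
    seen, stack = set(), [start]
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def union_of_sources(nodes):
    """Union of the Source-lists of the given nodes."""
    result = set()
    for node in nodes:
        result |= SOURCE_LIST.get(node, set())
    return result


def suppliers(node):
    """Return (Total-Suppliers-List, Partial-Suppliers-List) for `node`."""
    total = union_of_sources(reachable(node, BELOW) | {node})  # chains include node
    partial = union_of_sources(reachable(node, ABOVE))         # chains exclude node
    return total, partial - total


print(suppliers("n21"), suppliers("n32"))
```

For n21 and n32 the sketch reproduces the lists derived in the text; for other nodes the result is only as complete as the assumed fragment of the lattice.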

In this scenario, an application can determine the list of potential sources which can
supply a semantically meaningful solution to its query.

$\pi^{ap1}_{\text{Trade-price}} = \{(n21\ \{db1\}\ \{\}),\ (n32\ \{db1, db2\}\ \{\})\}$

If the application needs data related to 'Trade-price' in the context n21 then it will select
db1; however, if the 'Trade-price' data needed are in the context n32 then either db1 or db2
is selected.
Similarly, one can write the characterizing environments of Trade-price in ap2 and ap3 with
Total-Suppliers-lists and Partial-Suppliers-lists.

$\pi^{ap2}_{\text{Trade-price}} = \{(n21\ \{db1\}\ \{\}),\ (n33\ \{db1\}\ \{\})\}$

$\pi^{ap3}_{\text{Trade-price}} = \{(n13\ \{\}\ \{db1, db3\})\}$

From the above characterizing environments, applications ap2 and ap3 can select
data-sources which are compatible with the required context of the property 'Trade-price'.

The issue that still needs to be addressed is how to ensure the correctness of the
Total-Suppliers-lists and Partial-Suppliers-lists that are distributed among the CDRs of several
applications when semantics change at either the data-source level or the application
level. This issue is addressed in the next subsection.

5.1      Integrity  of  Semantic  Mappings 

Semantics of a non-primitive property can change either at an application or at a data
source. A mechanism to cope with changes in the semantics of a non-primitive property in
the proposed context interchange architecture is discussed in the following paragraphs.

Changes in Application Data Semantics: Whenever the semantics of a non-primitive
property in an application change, these changes must be reported to the context mediator.
The context mediator updates the consumer-lists of affected nodes in the context assertion
lattice and, using this context assertion lattice, generates new Total-Suppliers-lists
and Partial-Suppliers-lists for all new nodes in the characterizing environment of the
non-primitive property in the CDR of the application. The application should not request data
related to the non-primitive property until the context mediator returns the new set of
Total-Suppliers-lists and Partial-Suppliers-lists for the set of new nodes entered into the
application CDR.

Changes in Data Source's Data Semantics: Whenever the semantics of a non-primitive
property at a data source change, such changes will be reported to the context mediator. The
context mediator updates the source-lists of nodes in the context assertion lattice corresponding
to the old and new contexts of the non-primitive property. Concurrently, the context mediator
identifies the set of affected applications using the consumer-lists attached to nodes in the
context assertion lattice, evaluates new Total-Suppliers-lists and Partial-Suppliers-lists, and
updates the CDR of each affected application. If a data-source changes the semantics of a
non-primitive property, then it should not allow any application to access this property until
the context mediator resolves all semantic mismatches by updating the CDRs of all affected
applications.
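The data-source update path described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the paper's implementation: the class name `ContextMediator`, its methods, and the rule that an application is "affected" when its node lies on some chain through the old or the new node are all hypothetical, and the access lock on the data-source is omitted for brevity.

```python
from typing import Dict, Set, Tuple


def closure(start: str, edges: Dict[str, Set[str]]) -> Set[str]:
    """Nodes reachable from `start` along `edges`, excluding `start` itself."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        for nxt in edges[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


class ContextMediator:
    """Hypothetical mediator for a single non-primitive property."""

    def __init__(self, children: Dict[str, Set[str]]) -> None:
        self.children = children
        self.parents: Dict[str, Set[str]] = {n: set() for n in children}
        for parent, kids in children.items():
            for kid in kids:
                self.parents[kid].add(parent)
        self.source_lists: Dict[str, Set[str]] = {n: set() for n in children}
        self.consumer_lists: Dict[str, Set[str]] = {n: set() for n in children}

    def supplier_lists(self, node: str) -> Tuple[Set[str], Set[str]]:
        """Return (Total-Suppliers-list, Partial-Suppliers-list) for `node`."""
        above = closure(node, self.parents) | {node}
        below = closure(node, self.children)
        total = set().union(*(self.source_lists[n] for n in above))
        partial = set().union(*(self.source_lists[n] for n in below))
        return total, partial

    def source_changed(self, source: str, old_node: str, new_node: str):
        """Move `source` to its new context and recompute affected CDR entries."""
        self.source_lists[old_node].discard(source)
        self.source_lists[new_node].add(source)
        touched: Set[str] = set()
        for node in (old_node, new_node):
            touched |= {node} | closure(node, self.parents) | closure(node, self.children)
        # Assumption: each application appears in one consumer-list only.
        return {app: self.supplier_lists(node)
                for node in touched for app in self.consumer_lists[node]}


# Hypothetical single-chain lattice n0 -> n1 -> n13 -> n21.
lattice = {"n0": {"n1"}, "n1": {"n13"}, "n13": {"n21"}, "n21": set()}
mediator = ContextMediator(lattice)
mediator.source_lists["n13"].add("db1")
mediator.consumer_lists["n21"].add("ap1")
mediator.consumer_lists["n1"].add("ap3")
new_cdr_entries = mediator.source_changed("db1", "n13", "n1")
```

In this toy run, db1 was only a partial supplier for ap3 before the change and becomes a total supplier afterwards, while the entry for ap1 is recomputed but stays the same.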

In  the  above  discussion,  a  mechanism  for  semantic  interoperability  between  an  application 
and  a  data-source  with  respect  to  a  single  non-primitive  property  is  discussed.  In  fact,  an 
application  query  may  consist  of  one  or  more  non-primitive  properties  and  may  have  some 
primitive  properties.  In  such  a  situation,  identification  of  data-sources  which  can  supply 
a  semantically  meaningful  solution  to  an  application  query  in  a  cost-effective  manner  is  a 
challenging task. In the following section, we propose an algorithm to identify a set of
data-sources  which  can  supply  semantically  meaningful  data  to  an  application  query. 

6      Query  Processing 

An  application  generates  a  query  based  on  its  data  requirements.  Each  query  consists  of  two 
parts:  the  target  part  and  the  qualification  part.  An  application  query  can  be  interpreted  as  a 
request  for  the  instances  of  the  target  properties  within  the  context  given  by  the  qualification 
of  the  query.  The  semantics  of  the  properties  referred  to  in  the  query  can  be  determined 
by  the  CDR  of  the  qualification  of  the  query.  Therefore,  one  needs  to  identify  semantic 
assertions  which  are  true  in  the  context  given  by  the  qualification  part  of  the  query  and  the 
CDR  of  the  application. 

The  context  of  a  non-primitive  property  in  the  query  can  be  defined  only  if  the  query 
is  well-formed  at  the  application.  We  define  a  query  to  be  well-formed  with  respect  to  a 
non-primitive  property  at  an  application,  if  the  semantics  of  the  non-primitive  property  in 
the  query  is  derivable  from  the  qualification  of  the  query  and  the  CDR  of  the  application. 
In other words, the qualification of the query must be complete enough to determine the
semantic assertions of a non-primitive property referred to in the query. A well-formed
query with respect to a non-primitive property at a data source can be defined in a similar
manner. Once the semantic assertions of the properties referred to in the query are determined,
the query processor identifies a set of sites which can supply a semantically meaningful
solution to the query. In the following subsection, we provide an algorithm to identify a set
of such sites.

6.1       Site-selection  Algorithm 

The process of site-copy selection is a mechanism for selecting a set of data sources for
processing a given query when more than one candidate set is available. The site-copy selection
algorithm in MERMAID [Te87] concentrated on optimizing query execution cost and
did not consider semantic heterogeneity during the selection of processing sites; such
heterogeneity is instead resolved by the integrated schema in the MERMAID architecture. The
MERMAID algorithm selects, out of all possible sets of candidate sites, the set consisting of
the minimum number of sites to process the query. The rationale is that, in general, the data
communication requirements are likely to be lowest when the number of sites involved in
query processing is smallest, which in turn reduces the overall query processing cost. Since
there is no integrated schema in our context interchange architecture, the site-copy selection
algorithm must deal with semantic heterogeneity issues. The algorithm below considers such
issues during the selection of processing sites.

The following algorithm selects a set of candidate sites for a given query which can supply
a meaningful solution to the query. The Total-Suppliers-List and Partial-Suppliers-List used
in the algorithm are defined in the previous section.
Candidate-sites(query, CDR(application))
{
    Let P be the set of primitive properties and X be the set of non-primitive
    properties referred to in the query;

    /* see Example-1 and Example-2 below */
    For each $p_i \in P$, let $SP_i$ be the set of data-sources where $p_i$ is available;
    Represent $SP_i$ in disjunctive normal form;

    For each $x_j \in X$, whose context is given by node $n_{x_j}$
    {
        /* see Example-1 below */
        if the Total-Suppliers-List of $n_{x_j}$ is not empty
        {   let $SX_j$ denote the Total-Suppliers-List of $n_{x_j}$;
            Represent $SX_j$ in disjunctive normal form }
        /* see Example-2 below */
        else if the Partial-Suppliers-List of $n_{x_j}$ is not empty
        {   let $SX_j$ denote the collection of Partial-Suppliers-Lists of $n_{x_j}$;
            Represent each $S_{ji} \in SX_j$ in disjunctive normal form
            and assign their conjunction to $SX_j$ }
    }

    Set Candidate-Sites to $\left(\bigwedge_{p_i \in P} SP_i\right) \wedge \left(\bigwedge_{x_j \in X} SX_j\right)$;
    Translate the expression Candidate-Sites into disjunctive normal form;
    Select the conjunctive term from Candidate-Sites which has the minimum number of sites;
    if more than one such term exists then select any one of them;
    return the selected term.
}
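The algorithm above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: a disjunctive normal form is represented as a set of frozensets (each frozenset is one conjunctive term), every member of a Partial-Suppliers-List is treated as a single site that must take part, ties are broken alphabetically, and the names `any_of`, `all_of`, `conjoin`, and `candidate_sites` are hypothetical. The two calls at the end use the data of Example-1 and Example-2 below.

```python
from itertools import product
from typing import Dict, FrozenSet, Optional, Set, Tuple

Term = FrozenSet[str]   # one conjunctive term: sites that must all take part
DNF = Set[Term]         # a disjunction of conjunctive terms


def any_of(sites: Set[str]) -> DNF:
    """s1 OR s2 OR ...: one single-site term per source."""
    return {frozenset([s]) for s in sites}


def all_of(sites: Set[str]) -> DNF:
    """s1 AND s2 AND ...: a single term containing every source."""
    return {frozenset(sites)}


def conjoin(a: DNF, b: DNF) -> DNF:
    """AND of two DNF expressions, distributed back into DNF."""
    return {ta | tb for ta, tb in product(a, b)}


def candidate_sites(
    primitive: Dict[str, Set[str]],
    non_primitive: Dict[str, Tuple[Set[str], Set[str]]],
) -> Optional[Term]:
    """Return a smallest conjunctive term, or None if no site set qualifies.

    primitive:      property -> data-sources where it is available
    non_primitive:  property -> (Total-Suppliers-List, Partial-Suppliers-List)
                    for the node giving the property's context
    """
    expr: DNF = {frozenset()}            # neutral element for conjunction
    for sources in primitive.values():
        expr = conjoin(expr, any_of(sources))
    for total, partial in non_primitive.values():
        if total:
            expr = conjoin(expr, any_of(total))
        elif partial:
            expr = conjoin(expr, all_of(partial))
        else:
            return None                  # no supplier for this property
    if not expr:
        return None
    # Fewest sites first; alphabetical order only makes the tie-break repeatable.
    return min(expr, key=lambda term: (len(term), sorted(term)))


all_dbs = {"db1", "db2", "db3"}
example_1 = candidate_sites(
    {"Exchange": all_dbs, "Instrument-type": all_dbs},
    {"Trade-price": ({"db1"}, set())},
)
example_2 = candidate_sites(
    {"Exchange": all_dbs},
    {"Trade-price": (set(), {"db1", "db3"})},
)
print(sorted(example_1), sorted(example_2))
```

Representing each term as a set makes duplicate sites within a term collapse automatically, so the expansion stays small for the handful of sources considered here.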

Examples: 

The  following  examples  illustrate  the  selection  of  candidate  sites  to  process  a  given  query 
by  taking  semantic  heterogeneity  issues  into  consideration. 
Example-1: Assume that the following query is received from the application ap1.

    Select Trade-price
    Where (Exchange=Nyse and Instrument-type=Equity).

The query is well-formed at the application ap1.

P = { Exchange, Instrument-type }
X = { Trade-price }

$SP_{\text{Exchange}} = db1 \vee db2 \vee db3$
$SP_{\text{Instrument-type}} = db1 \vee db2 \vee db3$
$SX_{\text{Trade-price}} = db1$

The candidate sources for processing the complete query are given by

$(db1 \vee db2 \vee db3) \wedge (db1 \vee db2 \vee db3) \wedge (db1)$

The above expression is translated into disjunctive normal form as follows:

$(db1) \vee (db1 \wedge db2) \vee (db1 \wedge db3) \vee (db1 \wedge db2 \wedge db3)$

Since the first conjunctive term has the minimum number of elements, data-source db1 will be
selected to process the query.
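As a cross-check on the expansion above, the same expression can be simplified with the standard idempotence, distributive, and absorption laws of Boolean algebra. The derivation is an added illustration rather than a step of the algorithm; it shows that the minimal disjunctive normal form is db1 alone, which agrees with the selected site.

```latex
\begin{align*}
(db1 \vee db2 \vee db3) \wedge (db1 \vee db2 \vee db3) \wedge db1
  &= (db1 \vee db2 \vee db3) \wedge db1
     && \text{(idempotence)} \\
  &= db1 \vee (db1 \wedge db2) \vee (db1 \wedge db3)
     && \text{(distribution)} \\
  &= db1
     && \text{(absorption)}
\end{align*}
```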

Example-2: Assume application ap3 uses the query:

    Select Trade-price
    Where Exchange=Nyse

The query is well-formed at the application ap3.

P = { Exchange }
X = { Trade-price }

$SP_{\text{Exchange}} = db1 \vee db2 \vee db3$

Trade-price in the application ap3 has no complete solution at any single data-source;
it has only partial solutions at db1 and at db3.

$SX_{\text{Trade-price}} = (db1) \wedge (db3)$

The candidate sites to process the query are given by

$(db1 \vee db2 \vee db3) \wedge ((db1) \wedge (db3))$

The above expression is translated into disjunctive normal form as follows:

$(db1 \wedge db3) \vee (db2 \wedge db1 \wedge db3)$

Since the first term has the minimum number of sites, data-sources db1 and db3 can be
selected to process the query.

7      Conclusion 

In this paper, we provided a lattice-based framework for the description of context knowledge
for data-sources and applications. Since a hierarchical structure is implicit among disparate
contexts, the lattice model is ideally suited for the required representation. Our approach
requires that the context data repository (CDR) of an application and that of a data-source
be generated from the given set of context-lattices. In other words, all CDRs must share the
same set of context-lattices. Context comparison and context evolution tasks can be handled
efficiently using the lattice model. The lattice technique for representing context knowledge
is more appropriate since it involves only set operations for context comparison, unlike
a rule-based representation, which uses an inference mechanism to perform context comparison.

The  context-assertion  lattice  presents  a  set  of  legal  contexts  and  associated  assertions 
which  can  be  used  as  a  reference  set  for  a  Database  Administrator  or  an  application  developer 
to  frame  his  or  her  CDRs.  If  the  application  developer  has  the  option  to  choose  between 
semantic  assertions  then  he  or  she  can  use  the  existing  context-assertion  lattice  to  identify 
what kind of assertion would receive greater database support. Similarly, a database designer
can use the context-assertion lattice to fix semantic assertions in his or her data so that they
can be utilized meaningfully by many applications.

Our  model  is  powerful  enough  to  accommodate  semantic  conversions  during  context 
interchange  through  the  notion  of  node  equivalence.  If  two  nodes  in  a  context  assertion  lattice 
are  equivalent  then  all  applications  associated  with  one  of  the  nodes  can  get  semantically 
meaningful  data  from  the  data  sources  associated  with  the  other  node  and  vice  versa. 

We also derived conditions under which a source can supply meaningful data to an
application. Under these conditions, the context-mediator maintains the consistency of the
knowledge present in the application as well as the data-source CDRs. The query-processor,
using the knowledge present in the CDR of the application, can identify a set of data-sources
which can supply a meaningful solution to the application query. As such, the roles of the
context-mediator and the query-processor are separated. A context-driven data-source selection
algorithm for a given application query was also described.

Since the storage requirement for each context-assertion lattice can be large, the lattice
must be pruned to its minimum size before it is physically stored. All nodes in a
context-assertion lattice which are not present in any CDR can be pruned from the lattice. Since the
lattice is a directed acyclic graph, well-defined storage representation schemes and algorithms
to include a given node into the lattice and to search for a required node in the lattice are
available.